Clustering (Cluster Analysis) Statistical procedure for sorting objects into groups
(clusters) whose members share similar characteristics. A distinction is made between
supervised (groups known in advance) and unsupervised clustering (groups unknown).
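As an illustration of the unsupervised case, the following is a minimal sketch of k-means clustering, one common clustering procedure among many. The function name `kmeans`, the naive choice of the first k points as starting centres, and the expression values are assumptions made for this example only.

```python
from math import dist  # Euclidean distance, Python 3.8+

def kmeans(points, k, iterations=20):
    """Group points into k clusters; returns one cluster label per point."""
    # Assumption: the first k points serve as initial centres (deterministic but naive).
    centres = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centre.
        labels = [min(range(k), key=lambda c: dist(p, centres[c])) for p in points]
        # Update step: each centre moves to the mean of its current members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centres[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels

# Hypothetical data: two measurements per gene (invented values).
genes = [(1.0, 1.1), (5.0, 5.2), (0.9, 1.0), (5.1, 4.9), (1.2, 0.8)]
labels = kmeans(genes, k=2)
print(labels)
```

Genes with the same label fall into the same cluster; no group membership was supplied beforehand, only the number of clusters.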
Code Specification for the unambiguous representation or assignment of characters with
the aid of a given character sequence (e.g. the genetic code, which uses base triplets to
represent the 20 amino acids).
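The genetic code can be written down as a lookup table from base triplets (codons) to amino acids. The sketch below builds the standard genetic code from its usual compact string form (bases ordered T, C, A, G); the function name `translate` and the example sequence are assumptions for illustration.

```python
BASES = "TCAG"
# One-letter amino acid for each of the 64 codons in TCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS)
)

def translate(dna):
    """Translate a DNA coding sequence codon by codon until a stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # stop codon ends translation
            break
        protein.append(amino_acid)
    return "".join(protein)

print(translate("ATGGCCTAA"))
```

Each triplet maps to exactly one amino acid (the code is unambiguous), while several triplets may map to the same amino acid.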
COGs (clusters of orthologous genes) see last universal common ancestor.
Computers are data processing machines. To this end, they now typically consist of
electronic hardware (switches, transistors, integrated circuits) and other parts (input and
output devices, housings, etc.). They process instructions (software) in sequence to
generate new results from the data, e.g. calculations, sequences, result lists or networks
(typical results of bioinformatics calculations).
Consensus Sequence Sequence of the conserved residues in a multiple alignment of several
sequences, such as the conserved amino acids of an enzyme (see also PSSM).
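One simple way to derive a consensus sequence is a majority vote per alignment column. This is a minimal sketch under that assumption; the function name `consensus` and the aligned sequences are invented for illustration, and real tools usually also handle gaps and ambiguity codes.

```python
from collections import Counter

def consensus(alignment):
    """Return the most frequent character of each alignment column."""
    # zip(*alignment) iterates over columns; all sequences must have equal length.
    # On a tie, Counter keeps the character that appears first in the column.
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*alignment))

# Hypothetical aligned sequences (invented data).
aligned = ["ATGCA", "ATGTA", "ACGCA"]
print(consensus(aligned))
```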
Coronavirus see Pandemic.
Databases Different databases (software component) collect and integrate biological
data and make it available to the general public over the Internet via a computer that
provides this service (hardware component called a “server”). Databases hold all the data
that people look up, typically organized into many records. The different properties of a
particular record are held in individual data fields. How this looks in detail is determined
by the data model. Finally, the data can be searched using a query (database query). A
query language popular in bioinformatics for simple, smaller databases is the “Structured
Query Language” (“SQL” for short); such a database is then an SQL database. Important
bioinformatics databases are mentioned many times in the book, e.g. GenBank (genome and
nucleotide sequence data) and UniProt/Swiss-Prot (protein sequences).
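The terms record, data field, data model and query can be made concrete with a small SQL database. The following is a minimal sketch using Python's built-in `sqlite3` module; the table name `sequences`, its fields and all entries (including the accession identifiers) are hypothetical and do not come from any real database.

```python
import sqlite3

# An in-memory SQL database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Data model: each record has three data fields.
conn.execute(
    "CREATE TABLE sequences (accession TEXT PRIMARY KEY, organism TEXT, length INTEGER)"
)

# Hypothetical records (invented identifiers and values).
conn.executemany(
    "INSERT INTO sequences VALUES (?, ?, ?)",
    [
        ("X0001", "Homo sapiens", 1200),
        ("X0002", "Escherichia coli", 900),
        ("X0003", "Homo sapiens", 450),
    ],
)

# Database query: all human records, sorted by accession.
rows = conn.execute(
    "SELECT accession FROM sequences WHERE organism = ? ORDER BY accession",
    ("Homo sapiens",),
).fetchall()
print(rows)
```

The `?` placeholders pass values separately from the SQL text, which is the usual way to avoid malformed or unsafe queries.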
Data-Driven Modeling Normalization of the different units of the bioinformatic model
according to the experimental data, i.e. the typical time scales of the signaling cascade,
receptor excitation, phosphorylation of kinases, etc. are determined from the measurements.
Dimension Reduction see Principal Component Analysis (PCA).
DNA (deoxyribonucleic acid) Biochemically, a chain of nucleotides that are all connected
via a deoxyribose sugar and phosphate “backbone” to form a long molecule, the DNA single
strand. Of central importance in bioinformatics because DNA contains all the genetic
material (hereditary material, also called the genome) and thus all the hereditary
information of an organism. The DNA single strand pairs spontaneously with its
complementary strand, so that DNA